\chapter{Measure spaces}
\label{ch:measure_space}
Here is an outline of where we are going next.
Our \emph{goal} over the next few chapters
is to develop the machinery to state (and in some cases prove)
the law of large numbers and the central limit theorem.
For these purposes, the scant amount of work we did in Calculus 101
is going to be awfully insufficient:
integration over $\RR$ (or even $\RR^n$) is just not going to cut it.

This chapter will develop the theory of ``measure spaces'',
which you can think of as ``spaces equipped with a notion of size''.
We will then be able to integrate over these
with the so-called Lebesgue integral
(which in some senses is almost strictly better than the Riemann one).

\section{Letter connotations}
There are a lot of ``types'' of objects moving forward,
so here are the letter connotations we'll use
throughout the next several chapters.
This makes it easier to tell what the ``type'' of each object
is just by which letter is used.
\begin{itemize}
	\ii Measure spaces denoted by $\Omega$,
	their elements denoted by $\omega$.
	\ii Algebras and $\sigma$-algebras denoted by script $\SA$, $\SB$, \dots.
	Sets in them denoted by early capital Roman $A$, $B$, $C$, $D$, $E$, \dots.
	\ii Measures (i.e.\ functions assigning sets to reals)
	denoted usually by $\mu$ or $\rho$.
	\ii Random variables (functions sending worlds to reals)
	denoted usually by late capital Roman $X$, $Y$, $Z$, \dots.
	\ii Functions from $\RR \to \RR$
	by Roman letters like $f$ and $g$ for pdf's
	and $F$ and $G$ for cdf's.
	\ii Real numbers denoted by lower Roman letters like $x$, $y$, $z$.
\end{itemize}


\section{Motivating measure spaces via random variables}
To motivate \emph{why} we want to construct measure spaces,
I want to talk about a (real) \vocab{random variable},
which you might think of as
\begin{itemize}
	\ii the result of a coin flip,
	\ii the high temperature in Boston on Saturday,
	\ii the possibility of rain on your 18.725 date next weekend.
\end{itemize}

Why does this need a long theory to develop well?
For a simple coin flip one intuitively just thinks
``50\% heads, 50\% tails'' and is done with it.
The situation is a little trickier with temperature
since it is continuous rather than discrete,
but if all you care about is that one temperature,
calculus seems like it might be enough to deal with this.

But it gets more slippery once the variables start to ``talk to'' each other:
the high temperature tells you a little bit about whether it will rain,
because e.g.\ if the temperature is very high it's quite likely to be sunny.
Suddenly we find ourselves wishing we could talk about conditional probability,
but this is a whole can of worms --- the relations
between these sorts of things can get very complicated very quickly.

The big idea to getting a formalism for this is that:
\begin{moral}
	Our measure spaces $\Omega$ will be thought of as a space of entire worlds,
	with each $\omega \in \Omega$ representing a world.
	Random variables are functions from worlds to $\RR$.
\end{moral}
This way, the space of ``worlds'' takes care of all the messy interdependence.

Then, we can assign ``measures'' to sets of worlds:
for example, to be a fair coin means that if you are only interested in
that one coin flip, the ``fraction'' of worlds in
which that coin showed heads should be $\half$.
This is in some ways backwards from what you were told in high-school:
officially, we start with the space of worlds,
rather than starting with the probabilities.

It will soon be clear that there is no way we can assign
a well-defined measure to every single one of the $2^\Omega$ subsets.
Fortunately, in practice, we won't need to,
and the notion of a $\sigma$-algebra will capture the idea
of ``enough measur-\emph{able} sets for us to get by''.

\begin{remark}
	[Random seeds]
	Another analogy if you do some programming:
	each $\omega \in \Omega$ is a \emph{random seed},
	and everything is determined from there.
\end{remark}

\section{Motivating measure spaces geometrically}
So, we have a set $\Omega$ of possible points
(which in the context of the previous discussion
can be thought of as the set of worlds),
and we want to assign a \emph{measure} (think volume)
to subsets of points in $\Omega$.
We will now describe some of the obstacles that we will face,
in order to motivate \emph{how} measure spaces are defined
(as the previous section only motivated \emph{why} we want such things).

If you try to do this na\"{\i}vely,
you basically immediately run into set-theoretic issues.
A good example to think about why this might happen
is if $\Omega = \RR^2$ with the measure corresponding to area.
You can define the area of a triangle as in high school,
and you can then try and define the area of a circle,
maybe by approximating it with polygons.
But what area would you assign to the subset $\QQ^2$, for example?
(It turns out ``zero'' is actually a working answer.)
Or, a unit disk is composed of infinitely many points;
each of the points better have measure zero,
but why does their union have measure $\pi$ then?
Blah blah blah.

We'll say more about this later, but
you might have already heard of the \textbf{Banach-Tarski paradox}
which essentially shows there is no good way that you can assign a
measure to every single subset of $\RR^3$
and still satisfy basic sanity checks.
There are just too many possible subsets of Euclidean space.

However, the good news is that most of these sets are not ones
that we will ever care about,
and it's enough to define measures for certain
``sufficiently nice sets''.
The adjective we will use is \emph{measurable},
and it will turn out that this will be way, way more than good enough
for any practical purposes.

We will generally use $A$, $B$, \dots\ for measurable sets
and denote the entire family of measurable sets by curly $\SA$.

\section{$\sigma$-algebras and measurable spaces}
Here's the machine code.
\begin{definition}
	A \vocab{measurable space} consists of a space $\Omega$ of points,
	and a \vocab{$\sigma$-algebra} $\SA$ of subsets of $\Omega$
	(the ``measurable sets'' of $\Omega$).
	The set $\SA$ is required to satisfy the following axioms:
	\begin{itemize}
		\ii $\SA$ contains $\varnothing$ and $\Omega$.
		\ii $\SA$ should be closed under complements and
		\emph{countable} unions/intersections.
		(Hint on nomenclature: $\sigma$ usually indicates
		some sort of ``countably finite'' condition.)
	\end{itemize}
\end{definition}
(Complaint: this terminology is phonetically confusing,
because it can be confused with ``measure space'' later.
The way to think about is that
``measur\emph{able} spaces have a $\sigma$-algebra, so we \emph{could}
try to put a measure on it, but we \emph{haven't}, yet.'')

Though this definition is how we actually think about it in a few select cases,
for the most part, and we will usually instantiate $\SA$ in practice
in a different way:
\begin{definition}
	Let $\Omega$ be a set, and consider some family of subsets $\SF$ of $\Omega$.
	Then the \vocab{$\sigma$-algebra generated by $\SF$}
	is the smallest $\sigma$-algebra $\SA$ which contains $\SF$.
\end{definition}
As is commonplace in math, when we see ``generated'',
this means we sort of let the definition ``take care of itself''.
So, if $\Omega = \RR$, maybe I want $\SA$ to contain all open sets.
Well, then the definition means it should contain all complements too,
so it contains all the closed sets.
Then it has to contain all the half-open intervals too, and then\dots.
Rather than try to reason out what exactly the final shape $\SA$ looks like
(which basically turns out to be impossible),
we just give up and say ``$\SA$ is all the sets you can get if you start
with the open sets and apply repeatedly union/complement operations''.
Or even more bluntly: ``start with open sets, shake vigorously''.\footnote{As
	will be mentioned later in \Cref{sec:define_lebesgue_measure},
	an explicit construction using transfinite induction is possible.
	That construction is also useful for, for example, proving $|\SB(\RR)| = |\RR|$.}

I've gone on too long with no examples.
\begin{example}
	[Examples of measurable spaces]
	The first two examples actually say what $\SA$ is;
	the third example (most important) will use generation.
	\begin{enumerate}[(a)]
		\ii If $\Omega$ is any set,
		then the power set $\SA = 2^{\Omega}$ is obviously a $\sigma$-algebra.
		This will be used if $\Omega$ is countably finite,
		but it won't be very helpful if $\Omega$ is huge.
		\ii If $\Omega$ is an uncountable set,
		then we can declare $\SA$ to be all subsets of $\Omega$
		which are either countable,
		or which have countable complement.
		(You should check this satisfies the definitions.)
		This is a very ``coarse'' algebra.
		\ii If $\Omega$ is a topological space,
		the \vocab{Borel $\sigma$-algebra}
		is defined as the $\sigma$-algebra generated by all the open sets of $\Omega$.
		We denote it by $\SB(\Omega)$,
		and call the space a \vocab{Borel space}.
		As warned earlier, it is basically impossible to describe
		what it looks like,
		and instead you should think of it as saying
		``we can measure the open sets''.
	\end{enumerate}
\end{example}
\begin{ques}
	Show that the closed sets are in $\SB(\Omega)$ for
	any topological space $\Omega$.
	Show that $[0,1)$ %chktex 9
	is also in $\SB(\RR)$.
\end{ques}

\section{Measure spaces}
\begin{definition}
	Measurable spaces $(\Omega, \SA)$ are then equipped
	with a function $\mu \colon \SA \to [0, +\infty]$
	called the \vocab{measure}, which is required to satisfy
	the following axioms:
	\begin{itemize}
		\ii $\mu(\varnothing) = 0$
		\ii \textbf{Countable additivity}:
		If $A_1$, $A_2$, \dots\ are disjoint sets in $\SA$,
		then \[ \mu\left( \bigsqcup_n A_n \right) = \sum_n \mu(A_n). \]
	\end{itemize}
	The triple $(\Omega, \SA, \mu)$ is called a \vocab{measure space}.
	It's called a \vocab{probability space} if $\mu(\Omega) = 1$.
\end{definition}
\begin{exercise}
	[Weaker equivalent definitions]
	I chose to give axioms for $\SA$ and $\mu$
	that capture how people think of them in practice,
	which means there is some redundancy:
	for example, being closed under complements and unions
	is enough to get intersections, by de Morgan's law.
	Here are more minimal definitions,
	which are useful if you are trying to prove something satisfies them
	to reduce the amount of work you have to do:
	\begin{enumerate}[(a)]
		\ii The axioms on $\SA$ can be weakened
		to (i) $\varnothing \in \SA$ and (ii) $\SA$ is closed under
		complements and countable unions.
		\ii The axioms on $\mu$ can be weakened to
		(i) $\mu(\varnothing) = 0$,
		(ii) $\mu(A \sqcup B) = \mu(A) + \mu(B)$, and
		(iii) for $A_1 \supseteq A_2 \supseteq \cdots$ with $\mu(A_1) < \infty$,
		we have $\mu\left( \bigcap_n A_n \right) = \lim_n \mu(A_n)$.
	\end{enumerate}
\end{exercise}


\begin{remark}
Here are some immediate remarks on these definitions.
\begin{itemize}
	\ii If $A \subseteq B$ are measurable,
	then $\mu(A) \le \mu(B)$ since $\mu(B) = \mu(A) + \mu(B-A)$.
	\ii In particular, in a probability space all measures are in $[0,1]$.
	On the other hand, for general measure spaces we'll allow $+\infty$
	as a possible measure
	(hence the choice of $[0,+\infty]$ as codomain for $\mu$).
	\ii We want to allow at least countable unions / additivity
	because with finite unions it's too hard to make progress:
	it's too hard to estimate the area of a circle
	without being able to talk about limits of countably infinitely many triangles.
	\end{itemize}
\end{remark}
We \emph{don't} want to allow uncountable unions and additivity,
because uncountable sums basically never work out.
In particular, there is a nice elementary exercise as follows:
\begin{exercise}
	[Tricky]
	Let $S$ be an uncountable set of positive real numbers.
	Show that some finite subset $T \subseteq S$ has sum greater than $10^{2019}$.
	Colloquially, ``uncountably many positive reals cannot have finite sum''.
\end{exercise}
So countable sums are as far as we'll let the infinite sums go.
This is the reason why we considered $\sigma$-algebras in the first place.


\begin{example}
	[Measures]
	We now discuss measures on each of the spaces
	in our previous examples.
	\begin{enumerate}[(a)]
		\ii If $\SA = 2^\Omega$ (or for that matter any $\SA$)
		we may declare $\mu(A) = |A|$ for each $A \in \SA$
		(even if $|A| = \infty$).
		This is called the \vocab{counting measure},
		simply counting the number of elements.

		This is useful if $\Omega$ is countably infinite,
		and optimal if $\Omega$ is finite (and nonempty).
		In the latter case, we will often normalize by
		$\mu(A) = \frac{|A|}{|\Omega|}$
		so that $\Omega$ becomes a probability space.

		\ii Suppose $\Omega$ was uncountable
		and we took $\SA$ to be the countable sets and their complements.
		Then
		\[
			\mu(A) = \begin{cases}
				0 & \text{$A$ is countable} \\
				1 & \text{$\Omega \setminus A$ is countable}
			\end{cases}
		\]
		is a measure. (Check this.)

		\ii Elephant in the room:
		defining a measure on $\SB(\Omega)$ is hard even for $\Omega = \RR$,
		and is done in the next chapter.
		So you will have to hold your breath.
		Right now, all you know is that by declaring my \emph{intent}
		to define a measure $\SB(\Omega)$,
		I am hoping that at least every open set will have a volume.
	\end{enumerate}
\end{example}

\section{A hint of Banach-Tarski}
I will now try to convince you that $\SB(\Omega)$
is a necessary concession,
and for general topological spaces like $\Omega = \RR^n$,
there is no hope of assigning a measure to $2^{\Omega}$.
(In the literature, this example is called a Vitali set.)
\begin{example}
	[A geometric example why $\SA = 2^\Omega$ is unsuitable]
	Let $\Omega$ denote the unit circle in $\RR^2$
	and $\SA = 2^\Omega$.
	We will show that any measure $\mu$ on $\Omega$
	with $\mu(\Omega) = 1$ will have undesirable properties.

	Let $\sim$ denote an equivalence relation on $\Omega$
	defined as follows: two points are equivalent
	if they differ by a rotation around the origin by a rational multiple of $\pi$.
	We may pick a representative from each equivalence class,
	letting $X$ denote the set of representatives.
	Then
	\[ \Omega = \bigsqcup_{\substack{q \in \QQ \\ 0 \le q < 2}}
		\left( X \text{ rotated by $q\pi$ radians} \right).  \]
	Since we've only rotated $X$,
	each of the rotations should have the same measure $m$.
	But $\mu(\Omega) = 1$,
	and there is no value we can assign that measure:
	if $m = 0$ we get $\mu(\Omega) = 0$
	and $m > 0$ we get $\mu(\Omega) = \infty$.
\end{example}
\begin{remark}
	[Choice]
	Experts may recognize that picking a representative
	(i.e.\ creating set $X$)
	technically requires the Axiom of Choice.
	That is why, when people talk about Banach-Tarski issues,
	the Axiom of Choice almost always gets honorable mention as well.
\end{remark}

Stay tuned to actually see a construction for $\SB(\RR^n)$
in the next chapter.

\section{Measurable functions}
\label{sec:measurable_functions}
\prototype{For $S \subseteq \Omega$, $\mathbf{1}_S \colon \Omega \to \RR$ is a measurable function if and
only if $S$ is a measurable set.}
In the past, when we had topological spaces,
we considered continuous functions.
The analog here is:
\begin{definition}
	Let $(X, \SA)$ and $(Y, \SB)$ be measurable spaces
	(or measure spaces).
	A function $f \colon X \to Y$ is \vocab{measurable}
	if for any measurable set $S \subseteq Y$ (i.e.\ $S \in \SB$)
	we have $f\pre(S)$ is measurable (i.e. $f\pre(S) \in \SA$).

  In most cases $Y$ is actually a topological space with the Borel $\sigma$-algebra
  (e.g.\ $Y = \RR$) and in that case we can replace ``measurable set $S$''
  with ``open set $S$''.
  You can take this as a standing assumption for the rest of this text.
\end{definition}

Apart from the obvious symmetry with the definition of continuous function,
as we will see in \Cref{sec:lebesgue_int_equivalent_def}, this definition is such that
for a nonnegative function $f \colon \Omega \to \RR_{\geq 0}$,
$\int_\Omega f \; d\mu$ exists if and only if $f$ is measurable.

\begin{remark}
	Note that a measurable function actually does not need to be continuous,
	as the next example shows.
	On the other hand, most functions you actually encounter in practice will be continuous,
	and in that case ware fine.
\end{remark}
\begin{example}
	[Continuous function with non-measurable preimage of measurable set]
	Let $f \colon [0, 1] \to [0, 1]$ be the Devil's Staircase (or Cantor function).
	This function is continuous, and has the property that, let $C \subseteq [0, 1]$ be the Cantor set,
	then $|C| = 0$, yet $f\im(C) = [0, 1]$ with measure $1$.

	Let $g \colon [0, 1] \to [0, 2]$ be defined by $g(x) = f(x) + x$. Then,
	\begin{itemize}
		\ii For each open interval $(a, b)$ that is removed from the Cantor set $C$, then $|g\im((a,
		b))| = |(a, b)|$.
		\ii $g\im(C)$ has measure $1$.
	\end{itemize}

	Note that $g$ is bijective, let $h \colon [0, 2] \to [0, 1]$, $h = g\inv$. Then $h$ is continuous,
	however:
	\begin{itemize}
		\ii $h\pre(C) = g\im(C)$ has measure $1$, so it has some non-measurable subset,
		\ii $C$ has measure $0$, so every subset of $C$ is (Lebesgue) measurable,
		\ii thus, $h\pre(D)$ is non-measurable for some measurable subset $D \subseteq C$.
	\end{itemize}
\end{example}

\begin{proposition}
	[Continuous implies Borel measurable]
	Suppose $X$ and $Y$ are topological spaces
	and we pick the Borel measures on both.
	A function $f \colon X \to Y$
	which is continuous as a map of topological spaces is also measurable.
\end{proposition}
\begin{proof}
	Follows from the fact that pre-images of open sets are open, thus Borel measurable.
\end{proof}

\section{On the word ``almost''}
In later chapters we will begin seeing the phrase ``almost everywhere''
and ``almost surely'' start to come up,
and it seems prudent to take the time to talk about it now.

\begin{definition}
	We say that property $P$ occurs
	\vocab{almost everywhere} or \vocab{almost surely} if the set
	\[ \left\{ \omega \in \Omega \mid \text{$P$ does not hold for $\omega$} \right\} \]
	has measure zero.
\end{definition}

For example, if we say ``$f = g$ almost everywhere''
for some functions $f$ and $g$ defined on a measure space $\Omega$,
then we mean that $f(\omega) = g(\omega)$ for all $\omega \in \Omega$
other than a measure-zero set.

There, that's the definition.
The main thing to now update your instincts on is that
\begin{moral}
	In measure theory,
	we basically only care about things up to almost-everywhere.
\end{moral}
Here are some examples:
\begin{itemize}
	\ii If $f=g$ almost everywhere,
	then measure theory will basically not tell these functions apart.
	For example, $\int_\Omega f \; d\omega = \int_\Omega g \; d\omega$
	will hold for two functions agreeing almost everywhere.
	\ii As another example,
	if we prove ``there exists a unique function $f$ such that so-and-so'',
	the uniqueness is usually going to be up to measure-zero sets.
\end{itemize}
You can think of this sort of like group isomorphism,
where two groups are considered ``basically the same'' when they are isomorphic,
except this one might take a little while to get used to.\footnote{In fact,
	some people will even define functions on measure spaces
	as \emph{equivalence classes} of maps,
	modded out by agreement outside a measure zero set.}

\section{\problemhead}
\begin{dproblem}
	Let $(\Omega, \SA, \mu)$ be a probability space.
	Show that the intersection of countably many sets of measure $1$
	also has measure $1$.
\end{dproblem}

\begin{problem}
	[On countable $\sigma$-algebras]
	\onechili
	Let $\SA$ be a $\sigma$-algebra on a set $\Omega$.
	Suppose that $\SA$ has countable cardinality.
	Prove that $|\SA|$ is finite and equals a power of $2$.
\end{problem}
